Supercomputer Fugaku Project


FUJITSU 



RIKEN and Fujitsu are currently developing Japan's
next-generation flagship supercomputer, the successor to
the K computer, as the most advanced general-purpose
supercomputer in the world.

■ RIKEN and Fujitsu announced that manufacturing started in March 2019

■ RIKEN announced on May 23, 2019 that the supercomputer is named "Fugaku" 
Goals and Approaches for Fugaku

■ Goals

- High application performance
- Good usability and wide range of uses
- Keeping application compatibility

□ Performance Targets 


□ Geometric Mean of Performance of 9 Target Applications


RIKEN announced predicted performance: 

■ More than 100x faster than K computer for GENESIS and
NICAM+LETKF

■ Geometric mean of speedup over K computer in 9 Priority Issues
is greater than 37x

https://postk-web.r-ccs.riken.jp/perf.html 
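
The geometric mean quoted above is the n-th root of the product of the per-application speedups. The following is a minimal Python sketch of that calculation. The speedup values are made-up placeholders, not RIKEN's published figures, and `geometric_mean_speedup` is a hypothetical helper name.

```python
import math

def geometric_mean_speedup(speedups):
    """Return the geometric mean of a list of positive speedup factors."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    # Average the logarithms, then exponentiate: exp(mean(log(s))).
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Placeholder values for illustration only (NOT measured Fugaku results).
example_speedups = [4.0, 9.0, 6.0]
print(round(geometric_mean_speedup(example_speedups), 3))
```

Unlike an arithmetic mean, the geometric mean is not dominated by one or two applications with very large speedups, which is why it is commonly used to summarize ratios across a benchmark set.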
■ Approaches

Develop

1. High-performance Arm CPU A64FX in HPC and AI areas
2. Cutting-edge hardware design
3. System software stack

Achieve

- High performance in real applications
- High efficiency in key features for AI applications
1. High-Performance Arm CPU A64FX in HPC and AI Areas

■ Architecture features

ISA            Armv8.2-A (AArch64 only), SVE (Scalable Vector Extension)
SIMD width     512-bit
Precision      FP64/32/16, INT64/32/16/8
Cores          48 computing cores + 4 assistant cores (4 CMGs)
Memory         HBM2: Peak B/W 1,024 GB/s
Interconnect   TofuD: 28 Gbps x 2 lanes x 10 ports

Peak performance (Chip level) 
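
Chip-level peak is usually estimated as cores x SIMD pipelines per core x vector lanes x 2 x clock frequency, where the factor 2 counts a fused multiply-add as two operations. The sketch below plugs in the 48 computing cores and 512-bit SIMD width from the architecture features above, plus the two SIMD pipelines per core stated later in this deck. The clock frequency is not given in this deck, so it is left as a parameter. The FMA convention and the function names are assumptions for illustration.

```python
def ops_per_cycle(cores=48, pipelines_per_core=2, simd_bits=512, element_bits=64):
    """Peak operations per clock cycle for one chip.

    Assumes every SIMD lane retires one fused multiply-add per cycle,
    counted as 2 operations (a common convention, assumed here).
    """
    lanes = simd_bits // element_bits
    return cores * pipelines_per_core * lanes * 2

def peak_tops(clock_ghz, element_bits=64):
    """Peak tera-operations per second at a given clock in GHz."""
    return ops_per_cycle(element_bits=element_bits) * clock_ghz * 1e9 / 1e12

print(ops_per_cycle(element_bits=64))  # FP64 operations per cycle
print(ops_per_cycle(element_bits=16))  # FP16 operations per cycle
# Lowest clock consistent with a 2.7 TFlops FP64 peak under these assumptions:
print(2.7e12 / ops_per_cycle(element_bits=64) / 1e9, "GHz")
```

Under these assumptions, halving the element width doubles the lane count, so the FP16 peak is four times the FP64 peak. This matches the ratio between the "2.7 T+" per-CPU figure and the "10.8+ TOPS" FP16 figure quoted elsewhere in the deck.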
2. Cutting-edge Hardware Design

■ 1 PFlops by Fugaku and K computer

                Fugaku                    K computer
Configuration   1x rack including SSDs    80x compute racks & 20x disk racks
Nodes           384                       8,160
Footprint       1.1 m² (0.8 m x 1.4 m)    128 m² (4 m x 32 m)

Scalable design

                      CPU      CMU      BoB     Shelf    Rack    Fugaku
Nodes                 1        2        16      48       384     150k+
Performance [Flops]   2.7 T+   5.4 T+   43 T+   129 T+   1 P+    400 P+
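
The performance row of the scalable-design table follows from multiplying the per-node figure by the node count at each packaging level. The following is a quick check in Python. It takes 2.7 TFlops per node and 150,000 nodes as lower bounds, since the slide gives "2.7 T+" and "150k+". The dictionary and function names are illustrative.

```python
PER_NODE_TFLOPS = 2.7  # lower bound from the slide ("2.7 T+")

# Node counts per packaging level, from the scalable-design table.
nodes_per_unit = {
    "CPU": 1,
    "CMU": 2,
    "BoB": 16,
    "Shelf": 48,
    "Rack": 384,
    "Fugaku": 150_000,  # slide says "150k+", so this is a lower bound
}

def unit_peak_tflops(unit):
    """Lower-bound peak for one packaging unit, in TFlops."""
    return nodes_per_unit[unit] * PER_NODE_TFLOPS

for unit in nodes_per_unit:
    tflops = unit_peak_tflops(unit)
    if tflops >= 1000:
        print(f"{unit:7s} {tflops / 1000:8.2f} PFlops")
    else:
        print(f"{unit:7s} {tflops:8.2f} TFlops")
```

The products (43.2, 129.6, 1,036.8 and 405,000 TFlops) agree with the rounded 43 T+, 129 T+, 1 P+ and 400 P+ entries.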


■ CMU (CPU Memory Unit)

■ 100% direct water cooling

■ 3x QSFP for AOC (Active Optical Cables)

■ Single-sided blind mate connectors for electrical signals and water

[Figure: CMU, with labels: water coupler, PCIe connector, TofuD connector, TofuD cables]
3. System Software Stack

■ Fujitsu developing system software in collaboration with RIKEN

■ Fujitsu Technical Computing Suite implementing development and execution environments
with great usability on large-scale systems

[Figure: software stack, top to bottom]

Fugaku applications

Fujitsu Technical Computing Suite / RIKEN developing system software

  Management software
    - System management for high availability & power saving operation
    - Job management for higher system utilization & power efficiency

  File system
    - FEFS: Lustre-based distributed file system
    - LLIO: NVM-based file I/O accelerator

  Programming environment
    - XcalableMP
    - MPI (Open MPI, MPICH)
    - Compilers (C, C++, Fortran)
    - Debugging and tuning tools
    - Math. libs.

Linux OS / McKernel (Lightweight kernel)

Fugaku system hardware

(Legend in the original figure: under development w/ RIKEN)


■ Exploits hardware performance by compiler optimizations such as SVE vectorization

■ Supports new programming language standards and data type FP16


High Performance in Real Applications

■ WRF: Weather Research and Forecasting model (v3.8.1)

- Vectorizing loops including IF statements is key optimization

■ Himeno Benchmark (Fortran90, size: XL)

- Stencil calculation to solve Poisson's equation by Jacobi method
- High memory B/W and long SIMD length work effectively
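
The Jacobi method updates every interior grid point from its neighbours' previous values, so each sweep streams the whole grid through memory and does only a few operations per point. This is why memory bandwidth matters for this kind of kernel. The sketch below is a simplified 2D, 5-point illustration in Python. It is not the Himeno kernel, which is a 3D Fortran code, and the grid size, source term and function names are assumptions chosen for brevity.

```python
def jacobi_step(u, f, h):
    """One Jacobi sweep for the 2D Poisson equation  -laplace(u) = f.

    u and f are square lists of lists; boundary values of u stay fixed.
    Returns the new grid and the largest change made at any point.
    """
    n = len(u)
    new = [row[:] for row in u]
    max_delta = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # 5-point stencil: average of the four neighbours plus source term.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                + h * h * f[i][j])
            max_delta = max(max_delta, abs(new[i][j] - u[i][j]))
    return new, max_delta

def solve(n=17, tol=1e-8, max_iters=10_000):
    """Iterate until the largest update is below tol; returns (grid, iterations)."""
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]   # zero boundary and initial guess
    f = [[1.0] * n for _ in range(n)]   # constant source term (illustrative)
    for it in range(1, max_iters + 1):
        u, delta = jacobi_step(u, f, h)
        if delta < tol:
            return u, it
    return u, max_iters

grid, iters = solve()
print(iters, grid[8][8])
```

Because each new value depends only on the previous sweep, the inner loop has no loop-carried dependence and maps naturally onto wide SIMD units.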



[Figure: WRF v3.8.1 (48-hour, 12km, CONUS) on 48 cores. Bars: Skylake (Xeon Platinum 8168), 2 CPUs; A64FX, 1 CPU, with source tuning]

* Normalized by the average elapsed time per timestep of Skylake
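
A loop with an IF statement in its body can be vectorized by turning the branch into a per-element mask that selects between the two results. Predicate registers in SVE play this role. The Python sketch below only illustrates the transformation on a made-up kernel. It is not taken from WRF, and the function names are hypothetical.

```python
def branchy(x, threshold):
    """Scalar loop with an IF in the body."""
    out = []
    for v in x:
        if v > threshold:
            out.append(v * 2.0)
        else:
            out.append(v + 1.0)
    return out

def masked(x, threshold):
    """Same result, expressed as mask + select over whole arrays.

    Each step is a uniform element-wise operation, which is roughly the
    shape a vectorizing compiler aims for with predicated instructions.
    """
    mask = [v > threshold for v in x]      # compare -> predicate
    then_vals = [v * 2.0 for v in x]       # "then" branch for every element
    else_vals = [v + 1.0 for v in x]       # "else" branch for every element
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

data = [0.5, 3.0, -1.0, 2.5]
assert branchy(data, 1.0) == masked(data, 1.0)
print(masked(data, 1.0))
```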



[Figure: Himeno Benchmark (Fortran90). Systems: Skylake (Xeon Platinum 8168), 2 CPUs; FX100, 1 CPU; A64FX, 1 CPU; SX-Aurora TSUBASA, 1 VE†; Tesla V100, 1 GPU†]

† Performance evaluation of a vector supercomputer SX-Aurora TSUBASA
https://dl.acm.org/citation.cfm?id=3291728
High Efficiency in Key Features for AI Applications

■ High FP16 & INT8 peak performance and high memory peak B/W

FP16 performance: 10.8+ TOPS, > 90%@HGEMM
INT8 performance: 21.6+ TOPS in partial dot product

Memory B/W: 1,024 GB/s, > 80%@STREAM Triad
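
The efficiency figures above translate into sustained rates by multiplying peak by the efficiency fraction. The following is a small Python sketch of that arithmetic. It treats the quoted peaks and percentages as lower bounds, and `sustained` is a hypothetical helper name.

```python
def sustained(peak, efficiency):
    """Sustained rate implied by a peak rate and an efficiency fraction."""
    return peak * efficiency

fp16_peak_tops = 10.8   # "10.8+ TOPS" from the slide (lower bound)
mem_peak_gbs = 1024.0   # "1,024 GB/s" from the slide

print(sustained(fp16_peak_tops, 0.90))  # HGEMM floor, in TOPS
print(sustained(mem_peak_gbs, 0.80))    # STREAM Triad floor, in GB/s
```

Read this way, the slide implies more than about 9.7 TOPS on HGEMM and more than about 819 GB/s on STREAM Triad.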

INT8 partial dot product

$C = \sum_{i=0}^{3} (A_i \times B_i) + C$

where A0 to A3 and B0 to B3 are 8-bit elements and C is a 32-bit accumulator.
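
The partial dot product multiplies four pairs of 8-bit values and adds the four products to one 32-bit accumulator. The following is a minimal Python model of one such lane, and of a vector of lanes. Treating the 8-bit inputs as signed and wrapping the accumulator on overflow are assumptions made for this sketch, and all function names are hypothetical.

```python
def to_int8(v):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return ((v + 128) % 256) - 128

def to_int32(v):
    """Wrap an integer into the signed 32-bit range."""
    return ((v + 2**31) % 2**32) - 2**31

def int8_partial_dot(a4, b4, c):
    """C = sum(A_i * B_i for i in 0..3) + C for one 32-bit lane."""
    assert len(a4) == len(b4) == 4
    acc = c
    for a, b in zip(a4, b4):
        acc += to_int8(a) * to_int8(b)
    return to_int32(acc)

def int8_dot_vector(a, b, c):
    """Apply the partial dot product lane by lane.

    a and b hold 4 * len(c) 8-bit values; c holds the 32-bit accumulators.
    A 512-bit vector would hold 64 8-bit inputs and 16 accumulators.
    """
    assert len(a) == len(b) == 4 * len(c)
    return [int8_partial_dot(a[4 * k:4 * k + 4], b[4 * k:4 * k + 4], c[k])
            for k in range(len(c))]

print(int8_dot_vector([1, 2, 3, 4, -1, -2, -3, -4],
                      [5, 6, 7, 8, 5, 6, 7, 8],
                      [10, 0]))
```

Accumulating in 32 bits keeps the sum of 8-bit products from overflowing in ordinary use: a single signed product is at most 16,384 in magnitude, far below the 32-bit limit.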



Functions contributing to key features in AI fields

- 2x 512-bit wide SIMD pipelines per core for FP16 and INT8
- High memory B/W and calculation throughput

Compilers & libraries

- Vectorization and software pipelining
- FP16 as data type of programming language (e.g., real(kind=2) in Fortran)
- Mathematical Library for HGEMM
Future Plans